Abstract

Prior research indicates that language-handicapped children 
obtain higher test scores when tested by personally familiar examiners 
than when tested by personally unfamiliar examiners. The present 
investigation inquired whether this finding is due to examinees' 
actual differential performance across the two examiner conditions, or
whether it is the result of testers' biased scoring of similar 
examinee performances. To make this determination, videotaped testing 
sessions, in which language-handicapped preschoolers were awarded 
higher scores by familiar examiners than by unfamiliar examiners, were 
shown to two certified speech clinicians who were blind to all
purposes of the study. These individuals rated each examinee's
performance in the familiar and unfamiliar examiner conditions.
Results indicated that the videotape raters, like the examiners, gave
higher scores to examinees' performance in the familiar condition,
corroborating the notion that language-impaired children actually 
perform more strongly with a familiar examiner. 
The Importance of Scorer Bias to Handicapped Preschoolers'
Stronger Performance with Familiar Examiners

During the past four years, Fuchs and associates conducted a 
program of research into the effects of examiner familiarity on the 
performance of language-impaired children. Findings indicated that 
these children performed more strongly when they were assessed by 
familiar testers than when they were tested by strange examiners. 
More specifically, this differential performance was obtained (a) when 
testers were inexperienced and also when they were professional speech 
clinicians (Fuchs, Fuchs, Dailey, & Power, 1983), (b) across studies 
employing experimentally-induced (e.g., Fuchs, Fuchs, Power, & Dailey,
in press) and long-term acquaintanceship (e.g., Fuchs, Fuchs, Garwick,
& Featherstone, 1983) definitions of examiner familiarity, (c) over
various levels of item difficulty and response modes (Fuchs,
Featherstone, Garwick, & Fuchs, in press), and (d) across preschool
and school-age language-impaired children (Fuchs, Fuchs, Power, &
Dailey, 1983). Finally, this program of research demonstrated that
the personal unfamiliarity of a tester not only discourages
language-impaired children's optimal, absolute performance but also
selectively depresses their performance relative to nonhandicapped children
(Fuchs, Fuchs, Power, & Dailey, 1983). 

Therefore, it appears that the effect of a tester's professional 
unfamiliarity prevails across a range of situations and that the use
of unfamiliar examiners represents systematic bias against and 
threatens the validity of the test performance of certain handicapped 
children. The salience of these findings is underscored by the facts 
that children typically are assessed by strange testers, most test
manuals do not prescribe pretest contact between examiners and
examinees (cf. Fuchs, Fuchs, Dailey, & Power, 1983), and test results
are used pervasively for making decisions about educational programs 
and student classification. 

Given the potentially negative, far-reaching implications that 
examiner unfamiliarity has for educational practice, it seems
important to explore how and why a tester's strangeness affects
certain examinees' performance. As a beginning, it may be useful to
recognize an assumption that has been made explicitly and repeatedly
in the paper thus far (as well as in all of the pertinent research to
date): namely, that language-impaired children perform more strongly 
in the familiar condition. It is possible, of course, that examiner 
familiarity does not affect the level of examinees' responding but
rather influences the accuracy of testers' judgment and scoring. A 
large and enduring literature on rater bias supports this latter 
possibility (e.g., Guilford, 1936; Rosenthal, 1980). In an effort to 
become clearer about the nature of examiner familiarity effects, the 
present study explored the impact of personal familiarity on testers' 
accuracy of scoring.

Method

Subjects 

Subjects were 22 (17 M, 5 F) Caucasian preschool children. The mean
and SD for their CA were 58.32 and 8.70 months, respectively. They
came from predominantly middle class homes in Central Massachusetts,
were moderately to profoundly language-impaired, and attended a public 
special education preschool program. All subjects performed within
the normal range on individually administered intelligence tests.
Examiners

Examiners were 22 Caucasian female graduate students at a state
college, and employees of public and private schools in Central
Massachusetts. (See Fuchs, Fuchs, Dailey, & Power, 1983, for examiner
selection procedure.) Eleven examiners were early childhood educators
(ECEs), who had an average 96.00 months (SD = 59.28) teaching
experience. None had formal training or professional experience with
either assessment or handicapped children. For this reason, they were 
conceptualized as the "inexperienced" examiner group. 

The other 11 examiners were speech clinicians (SCs) who had been 
practicing professionals for an average 85.09 months (SD = 74.06). By 
virtue of their professional experience and formal training addressing
both assessment and language-handicapped youngsters, the SCs were
assigned "experienced" examiner status. The two examiner groups were
similar with respect to the amount of their respective work
experiences, t(20) = .38, ns.
Measure 

The Preschool Language Scale, verbal expression scale (VE;
Zimmerman, Steiner, & Pond, 1979) was employed. Zimmerman et al.
reported split-half reliability coefficients ranging from .75 to .95, 
with a median of .88 on the total test. Using the Spearman-Brown 
Prophecy formula, reliability for the VE was estimated at .79. 
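
The inputs to this estimate are not reported. As a hedged illustration
only, the sketch below applies the general Spearman-Brown formula under
two assumptions that are not stated in the source: that the median
total-test coefficient of .88 was the starting value, and that the VE
scale is half the length of the total test. The function name and its
arguments are hypothetical.

```python
def spearman_brown(reliability: float, length_ratio: float) -> float:
    """Predict reliability when test length is multiplied by `length_ratio`."""
    return (length_ratio * reliability) / (1 + (length_ratio - 1) * reliability)


# Assumptions for illustration: start from the median total-test
# coefficient (.88) and treat the VE scale as half the total test length.
ve_estimate = spearman_brown(0.88, 0.5)
print(round(ve_estimate, 2))  # 0.79
```

Under these assumptions the predicted value agrees with the reported
.79, which suggests, but does not confirm, that the estimate was
derived this way.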
Design

Children were assigned randomly to SC and ECE groups. There was 
no difference between the two groups with respect to the children's
CA, t(20) = .75, ns, or sex, χ²(1) = 0.00, ns. Within examiner
groups, each child was assigned randomly to two examiners, one with
whom he or she became personally familiar and one to whom he or she
remained a stranger. The study required each examiner to serve in
both familiar and unfamiliar roles, thereby controlling for
potentially confounding effects of tester personality. Each child was
assessed twice during a period of three weeks, once by the familiar 
and once by the unfamiliar tester, within a crossover design: One- 
half of both ECE and SC examiners first tested familiar children, then 
unfamiliar children; the remaining examiners tested their examinees in 
reverse order. All testing occurred in the preschool's speech therapy
room, a setting with which all children were familiar. 
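
The exact assignment mechanics are not described beyond the constraints
above. One way to satisfy them within an examiner group can be sketched
as follows. This is an illustrative rotation scheme, not the authors'
documented procedure; the function name, the dictionary layout, and the
labels "familiar_first" and "unfamiliar_first" are all hypothetical.

```python
import random


def assign_examiners(children, examiners, seed=None):
    """Pair each child with one familiar and one unfamiliar examiner.

    Illustrative rotation only: after shuffling, examiner k is familiar
    to child k and unfamiliar to child k-1, so every examiner serves
    exactly once in each role.
    """
    if len(children) != len(examiners) or len(examiners) < 2:
        raise ValueError("need equal numbers of children and examiners (at least 2)")
    rng = random.Random(seed)
    kids = list(children)
    exs = list(examiners)
    rng.shuffle(kids)
    rng.shuffle(exs)
    n = len(exs)
    pairs = {
        child: {"familiar": exs[i], "unfamiliar": exs[(i + 1) % n]}
        for i, child in enumerate(kids)
    }
    # Alternate the testing order so that about half of the examiners
    # see their familiar child first.
    order = {
        ex: "familiar_first" if i % 2 == 0 else "unfamiliar_first"
        for i, ex in enumerate(exs)
    }
    return pairs, order


pairs, order = assign_examiners(
    [f"child{i}" for i in range(11)],
    [f"examiner{i}" for i in range(11)],
    seed=1,
)
```

With an odd group of 11 examiners, an exact half split is not possible;
this sketch produces six examiners in one order and five in the other.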
Procedure 

Personal familiarity. Examiners' personal familiarity was
induced experimentally by two procedures. Every tester was required 
to make a one-hour home visit. Examiners were told that there were 
two purposes for this visit: first, "to get to know the child and to 
permit the child to get to know you"; second, "to obtain information 
about the child from the mother." Accordingly, each tester was
instructed to take materials with which to play with her future 
examinee and to administer to the child's mother a structured 
interview that briefly explored the child's general functioning and 
likes and dislikes. (Although scored and returned to the 
investigators, the interview data were not subjected to analyses; the 
only purpose for the interview was to acquaint the examiner with the 
child.)

The second strategy to induce personal familiarity was to require
each tester to play with her "familiar" child for one hour immediately
preceding the testing session. The play occurred both in and outside
of the child's classroom. For this encounter, the tester provided the 
same materials with which she and the child had played during the home 
visit. The play outside of the child's room always followed the 
classroom interaction; the preschool encounters always followed the
examiners' home visits. The lapse in time separating the home visit 
and testing ranged between two and eight days. 

Training. ECE and SC examiners were trained separately to
administer the VE scale. The ECEs received a total of five hours of
instruction in two sessions. The SCs met for one session that lasted
2½ hours. A certified speech clinician conducted all training.

Videotaping. The students' test performance with familiar and
unfamiliar examiners was videotaped with two AVC 3200 Sony video
cameras on one-half inch videotape. The cameras, connected to a Sony
3600 recorder, were placed behind the examiner and examinee. With the
aid of a special effects generator (SEG-1), a split screen was created
displaying a frontal view of the upper torsos and heads of both
participants. Examiners were informed of the recording; examinees
were not.

Scoring. Examinees' performances on the VE, in both familiar and
unfamiliar testing conditions, were scored using two procedures.
First, investigators summed examiners' protocols that had been
completed during testing, using a blind procedure so that
investigators were unaware of examinees' names or testing conditions.
For the second scoring, two female certified speech clinicians, who
did not know (a) any of the examiners or examinees, (b) the purpose of 
the study, or (c) the testing conditions they viewed, completed new 
protocols as they watched the videotaped testing sessions. These 
raters scored equal numbers of SC and ECE examiner testing sessions. 
One rater observed 45% and 55% of familiar and unfamiliar testing
sessions, respectively, with the remaining sessions scored by the
other rater. Interscorer agreement, calculated on 18% of the
testing, ranged between .91 and .96. Later, the second set of
protocols was summed by investigators using a blind procedure. 

Results

A preliminary one between (SC vs. ECE), one within (personally
familiar vs. unfamiliar) analysis of variance (ANOVA) was run on the
VE scores (Winer, 1971). This ANOVA yielded one significant effect
for personal familiarity vs. unfamiliarity, F(1,20) = 3.56, p < .05.
Across experienced and inexperienced examiner conditions, subjects
performed an average 4.11 points higher when tested by personally
familiar examiners.

Next, for each child's performance, both in familiar and
unfamiliar testing conditions, a difference score was calculated
between the examiner's VE score and the videotape rater's VE score.
These difference scores, indicative of the examiners' scoring
accuracy, were entered into a one within (personally familiar vs.
unfamiliar) ANOVA, which revealed no statistically significant
difference in the scoring accuracy of examiners between familiar and
unfamiliar testing conditions, F(1,21) = .46, ns. The average
differences between examiners' and videotape raters' scores were 4.77
(SD = 5.83) and 3.61 (SD = 5.79), in the familiar and unfamiliar
conditions, respectively.
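
The logic of this difference-score analysis can be sketched in a few
lines. The sketch assumes a signed examiner-minus-rater difference (the
source does not state the sign convention) and uses the fact that a
one-within ANOVA with two levels is equivalent to a paired t test, with
F equal to t squared on (1, n - 1) degrees of freedom. The function
names are hypothetical and the scores are invented for illustration;
they are not the study's data.

```python
from math import sqrt
from statistics import mean, stdev


def difference_scores(examiner_scores, rater_scores):
    """Signed examiner-minus-rater difference per child (sign is an assumption)."""
    return [e - r for e, r in zip(examiner_scores, rater_scores)]


def one_within_f(familiar_diffs, unfamiliar_diffs):
    """F for a two-level within-subject factor (the paired t statistic squared)."""
    n = len(familiar_diffs)
    paired = [f - u for f, u in zip(familiar_diffs, unfamiliar_diffs)]
    # Undefined (division by zero) if every paired value is identical.
    t = mean(paired) / (stdev(paired) / sqrt(n))
    return t * t, (1, n - 1)


# Invented scores for three children, for illustration only.
familiar = difference_scores([10, 12, 15], [8, 9, 11])    # [2, 3, 4]
unfamiliar = difference_scores([9, 10, 12], [8, 9, 11])   # [1, 1, 1]
f_value, df = one_within_f(familiar, unfamiliar)
print(df)  # (1, 2)
```

With 22 children the same computation has (1, 21) degrees of freedom,
which matches the F(1,21) reported above.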

Discussion

Results indicate that a nonsignificant disparity was generated by 
contrasting examiners' vs. videotape raters' scores in the familiar 
testing condition with those in the unfamiliar test setting. Thus,
the videotape raters also obtained higher scores for the examinees'
performance in the familiar condition. Because these raters knew
nothing about the study's objectives or participants, the findings
seem to support the notion that examinees actually performed
differently across the two experimental conditions, rather than
performing similarly and receiving differential scores from biased
testers.

Although we found no evidence indicating that examiners' biased
scoring was responsible for examinees' differential performance, we do
not wish to imply that an examinee's performance is independent of
tester behavior. A previous study (Fuchs, Zern, & Fuchs, in press-a;
Fuchs, Zern, & Fuchs, in press-b) demonstrated an association between
children's differential verbal production in familiar vs. unfamiliar
examiner conditions and examiner behavior. In the familiar condition,
examinees spoke longer, more often, and with greater syntactic and
semantic complexity; familiar examiners (a) exercised more frequent
and longer intervals of silence than unfamiliar examiners, (b) often
used eye contact with examinees as a cue in deciding when to speak,
whereas unfamiliar examiners rarely utilized this cue, (c) employed
largely directive language in contrast to unfamiliar examiners' speech
that more frequently was participatory in nature, and (d) spoke for
shorter durations than unfamiliar examiners.

Thus, this study and previous related investigations suggest that
the situational factor, examiner familiarity, affects both examiner
and examinee behavior in dramatic and educationally significant ways.
Examiner trainers, test developers and publishers, professional groups
that are responsible for establishing and monitoring testing
standards, researchers, and users of test findings should consider
more seriously the role of tester familiarity and, simultaneously,
begin to question the possible importance of additional, unexplored
situational factors in the test situation to children's performance. 